Shareware Grab Bag

home *** CD-ROM | disk | FTP | other *** search

/ Shareware Grab Bag / Shareware Grab Bag.iso / 011 / qsort.arc / QSORT.DOC < prev

Wrap

Text File | 1985-06-30 | 14.9 KB | 320 lines

QSORT - Version 1.2 - Copyright (c) 1985 - Ben Baker QSORT may be freely copied and distributed for non- commercial purposes, provided that 1) it is distributed under the name "QSORT," and 2) this documentation file always accompanies it. Vendors wishing to distribute QSORT commercially, or with commercial products may contact the author at the address below for terms. QSORT is made available in the public domain under the "share-ware" concept. If you find this program useful, you are asked to send its author a donation commensurate with its value to you ($10 would be nice). This will encourage the development of other useful, affordable tools. Send checks to: Ben Baker P.O. Box 425 Godfrey, Il 62035 Bug reports or suggestions may be sent to the above address or via FidoNet mail to Fido node # 100/76, Ben's Bakery. (NOTE: This is NOT a BBS, but does receive FidoNet mail forwarded via Fido 100/51.) Purpose: This filter command reads text data from the standard input device, sorts it and writes it to the standard output device. Optionally, it may be used to sort a file "in place." Format: QSORT [filespec] [/R] [/Mlen] [/[L][+|-][col][:width]. . . Type: Internal External **** Remarks: Sorts are done using the ASCII collating sequence, or optionally using lexical sequence. Tab characters are not expanded with blanks. Up to 30 field parameters, /[L][+|-][col][:width] may be used to specify key fields and are ordered major to minor from left to right. The 'L' if present specifies "lexical" sequence for this field. (See discussion of lexical sequence below.) The minus (-) sign specifies reverse order for this field. The plus (+) sign (or no sign) specifies normal sort order. If present, [col] defines the beginning column of the field. If omitted, column 1 is assumed. If present, [:width] defines the field width in columns. If width is omitted, the rest of the record is assumed. If no key field parameters are given, the entire record is the key field. Any key field which falls beyond the end of a particular record is treated as a null (zero length) field for that record. The /Mlen parameter allows optionally specifying the maximum record length. It may lie in the range of /M1 to /M3600. The upper bound is necessary to permit a "worst case" merge to be done in a buffer less than 64 Kbytes in length. If this parameter is present, it must appear before any field parameters. If it is omitted, 132 is the default. NOTE: Maximum record length may be larger than any record in the file to be sorted, but excessive maximum record length will unecessarily degrade the performance of QSORT. The /R parameter is included for compatibility with DOS SORT and is redundant. It reverses the sense of sort direction for all key fields. If filespec is included in the command, any redirection specified will be ignored. Disk and path specifiers may be included in filespec and will be used for both input and output. QSORT will sort the input file defined by filespec to a file with an extent of .$$$. On successful completion of the sort, the input file is deleted and the output file is renamed to filespec, accomplishing, in effect, an "in-place" sort. The /Mlen parameter and field parameters must observe the relative ordering defined above, but filespec, redirection and/or the /R parameter may appear anywhere on the command line. Examples: Produce a directory listing sorted by creation date and time, and display it on the console a screen's worth at a time: A>DIR | QSORT /30:2 /24:5 /39 /34:5 | MORE The output of the DIR command is piped to QSORT. The key fields defined are, from left to right (major to minor), year (2 digits), month and day, AM/PM flag and time. The output of QSORT is then piped to MORE for display. Next, replace the unsorted FILE.TXT with the same data sorted in reverse order. Use columns 10 to 16 as the sort key: A>QSORT FILE.TXT /-10:7 or A>QSORT FILE.TXT /10:7 /R or (SORT compatible) A>QSORT FILE.TXT /R /+10 Finally, perform a simple sort on a file with 240 byte records (not including newline or EOL): A>QSORT LARGE.REC /M242 --- Implementation Notes --- QSORT is intended as an enhanced replacement for DOS SORT. It is nearly fully upward compatible, but provides much more flexibility. Multiple sort keys may be specified, a pseudo in-place sort may be performed and files and/or records of any size may be sorted provided only that there is sufficient disk space for work files and the output file. QSORT uses the "quick sort" algorithm, which cannot guarantee the order of records whose keys are all equal. This in the one "incompatibility" with DOS SORT, which retains the original order of records when its only key compares equal. This is important to SORT because it must be invoked multiple times to effect a multiple key sort. With QSORT you only sort once, and there are usually enough keys available to insure you get the order you want the first time. QSORT uses a sort buffer of about 60K bytes, and will fill the buffer as full as possible, and then sorts the contents of the buffer. If it has reached the end of the input file and has not yet generated any work files, the contents of the buffer are written to the output file, completing the sort operation. If the input file is too large to fit into the sort buffer, as much of the input file as possible is read into the buffer, sorted, then written to a temporary work file. This process is repeated as many times as necessary to process the entire input file, each time creating a new work file for the sorted output. Upon completion of the "sort phase," QSORT begins a "merge phase." Each work file is a sorted sub-set of the input file. Thus, work files may be read sequentially and comnbined to produce a sorted output. QSORT will open as many work files as DOS permits (more on this later). If all the remaining work files can be opened, the sorted result is written to the output file. Otherwise, a new work file is created and another merge pass will be required. On each merge pass, the number of work files is reduced and eventually all remaining work files will be opened and the sorted output file will be written completing the sort operation. QSORT is smart enough to never have just one work file remaining, which would require an unnecessary copy operation. QSORT, to work properly, needs enough space on the output disk to hold the output file. Even if the input file is to be deleted and resides in the same directory, that is not done until after the output file has been successfully written. If one merge pass is required, the default directory should have about 10% more space than the size of the input file for work files. If more than one merge pass will be required, allow about twice the size of the input file as work file space on the default disk. If QSORT has an error, it will terminate with the input file still intact, and will return to DOS with ERRORLEVEL = 1. Each time QSORT must create a new work file, the data put into it must be processed again. Obviously, the more files QSORT can open during the merge phase, the faster it can sort large files. If DOS is properly preconditioned, QSORT can have up to 15 work files open at once, and very large files can be sorted with just one sort pass and one merge pass. Unfortunately, that capability is not automatic. DOS has a fixed number of file "handles" that it associates with open files. The default number is eight, but DOS opens five of them for standard input, standard output, standard error, standard printer and standard auxiliary device. That leaves three for merging. A 250K input file would produce five work files and that would take three merge passes; merge two into one, leaving four; merge two into one leaving three; and finally merge three into the output file. Worse yet, since you need at least three handles for merging, if you have resident programs that have open files you can't merge at all! DOS can be told to set aside more space for file handles. Each handle is only 39 bytes and it's memory very well spent. One process can have a maximum of 20 handles open at one time, but since resident processes may be using handles, I recommend 25 to 35. To do this, the root directory of the DISK OR DISKS YOU BOOT FROM must contain a file named CONFIG.SYS. If your boot disk(s) already contains a CONFIG.SYS, edit it, or if not, create it to contain the following line: FILES=25 (or more) While we're at it, let's add one more thing to CONFIG.SYS which will improve the performance of QSORT and many other programs as well. DOS provides, by default, two disk buffers. These are the buffers it uses to do its disk reads and writes. During the merge phase QSORT may have many files open at once, reading from them in more or less random order. DOS may have to read the same physical sector several times to get all its data. But DOS can remember what's in each buffer and where it came from, and will not reread a sector it already has in a buffer. DOS needs 528 bytes for each buffer. I recommend 20 buffers to make QSORT perform well under the most adverse conditions. This will require an additional 9504 bytes or slightly more than 9K, again memory well spent, so we add to CONFIG.SYS the following line: BUFFERS=20 I hope you find this program useful. Your comments and suggestions are welcome. My address is at the top of this file. A planned version 2 will provide file merge and record selection capability. (Sort of a quick-and-dirty file manager utility.) Notes on Version 1.1: ----- -- ------- ---- QSORT 1.0 set an arbritrary maximum record length of 132. This should certainly be enough, right? Wrong! The program had not been released a month before I heard from a user who needed to sort 240-byte records. It occurred to me that, for any reasonable maximum length, someone would need to sort bigger records. The sort and merge subroutines need to know the maximum size record to be expected in order to manage their buffers, and an unnecessarily large limit would make them wasteful of space and degrade performance. The solution, of course, was to add the '/M' parameter, allowing a user to specify his own limit if 132 is not enough. One other minor change was made to the program. Version 1.0 documentation implied (in fact stated!) that column number could be omitted from a field parameter and would default to 1. In other words '/1:4' and '/:4' were equivalent. The program, however, would not accept the latter. Since this is a useful shorthand, the program has been changed to match the documentation. Notes on Version 1.2: ----- -- ------- ---- This version was borne out of my own need to sort files with mixed capitalization. ASCII sequence produced some bizzare results when words beginning with 'Z' sorted before those beginning with 'a.' Case-insensitive sorting isn't much better because upper and lower case get mixed randomly. The following table will illustrate what I mean: INPUT ASCII CASE LEXICAL INSENSITIVE DeLaPort Baker Baker Baker Smith Brown brown Brown brown DeAngelo bRown bRown deLaPorte DeLaPort Brown brown Deangelo Deangelo Deangelo DeAngelo deAngelo Deangelo deangelo Deangelo Brown DelaPort Deangelo Deangelo smith DelaPorte deAngelo deAngelo delaPorte Harry DeAngelo deangelo DelaPort Smith delaPort DeLaPort DeAngelo bRown DelaPort DelaPort DelaPorte brown delaPort delaPort deangelo deAngelo DeLaPort delaPort Harry deLaPorte DelaPorte DelaPorte delaPort deLaPorte deLaPorte deLaPorte Baker deangelo delaPorte deLaPorte deLaPorte delaPort deLaPorte delaPorte Deangelo delaPort Harry Harry bRown delaPorte smith Smith delaPort smith Smith smith The first column is a list of names in arbitrary order. The second is an ASCII sort of that list. Third we have one possible case-insensitive sort of the list. The fourth column is what I really wanted. It is sorted the way these words would be sorted in a dictionary (or lexicon). The lexical sort implemented in this version is achieved by making case-insensitive comparisions of entire fields. Only when a comparison of the endire field is equal is an ASCII comparison called upon to arbitrate ties. Lexical sorting can be very useful when needed, but be aware that unnecessarily specifying lexical ordering may degrade performance of QSORT. Speaking of performance, QSORT does pretty well. Matt Kanter of New York reports sorting a 650K file in 40 minutes and that ain't too shabby! Never-the-less, QSORT has been made more intelligent in the way it schedules merging of very very large files, thus improving performance when multiple merge passes are required, a feature that will go unnoticed by most users (sigh).